
Combining ADDA with Deep CORAL: Unsupervised Domain Adaptation for Image Classification

Figure 1: An illustration of our proposed method combining Deep CORAL and ADDA. Blue and orange arrows denote the data flows of the source and target domains, respectively. The blue encoder and classifier are pretrained and fixed.

Abstract

Unsupervised domain adaptation techniques are essential for real-world image classification tasks. Because the domain of images, i.e. the space of all possible images, is so enormous, models trained on any single dataset will inevitably suffer from out-of-domain issues. One promising research direction is to use domain adaptation methods to adapt a model trained on a source domain to a target domain. Adversarial Discriminative Domain Adaptation (ADDA) is a typical adversarial-learning-based unsupervised domain adaptation method. Though it has proved effective on simple and small datasets, it requires sophisticated training strategies and is at times hard to converge. We propose to forcibly align the distribution of the model's output with that of an adapted model, which also serves as the initialization for the adversarial training. In this way, the adversarial process is forced to search within a space whose results are at least as good as the initialization. Experiments on our proposed Tiny-16-Class-ImageNet show that our method is effective and efficient in terms of accuracy and training time.

Introduction

Background

By generalizability, we refer to a model's ability to perform equally well on unseen data. The word "domain" in this article denotes the space of input features $X$ together with the marginal distribution $P(X)$. Specifically, for image classification tasks, the domain of a training dataset consists of the space of possible images together with the marginal distribution of this dataset [6]. Generalizability is crucial for image classification models: the space of possible images is so large that any dataset can only capture a small fraction of it, and a model that fails to generalize is useless. Domain shift refers to two domains being different, which is common. For example, a model trained on images taken in daylight usually fails when used on images taken at night. Different patterns of perturbations, such as noise imposed on images, are another source of domain shift. To address domain shift, one promising research area is domain adaptation, which aims to adapt a model trained on a source domain to a target domain. In this project, we investigate the unsupervised domain adaptation problem, which does not require the target domain to be labeled.

Extensive domain adaptation algorithms have been proposed to account for the degradation in performance caused by domain shift. Deep CORAL [4] extends the unsupervised domain adaptation method CORAL to learn a nonlinear transformation that aligns the correlations of layer activations in deep neural networks. Adversarial Discriminative Domain Adaptation (ADDA) [5] combines discriminative modeling and generative adversarial networks to learn a discriminative mapping by fooling a domain discriminator.

Method

Datasets

Figure 2: Sample noises in the Tiny-16-Class-ImageNet dataset. Top row, left to right: no noise, uniform noise, salt-and-pepper noise. Bottom row, left to right: rotation, high-pass, low-pass. Image manipulations follow the procedure in [1].

We conduct experiments on two datasets: Tiny-16-Class-ImageNet and MNIST-USPS [2, 3]. Most experiments are done on Tiny-16-Class-ImageNet, which is self-produced following the guidelines in [1]. Tiny-16-Class-ImageNet has three subsets: a training set, a validation set, and a test set, containing 10015, 1269, and 10350 images respectively. All three subsets share the same 16 general classes (e.g. bear rather than brown bear) but differ in domain. The training and validation sets contain samples of different sub-classes (brown bear vs. black bear), and we apply different patterns of noise to generate different domains; sample noises are illustrated in Figure 2. The test set contains samples from every sub-class (brown bear, black bear, etc.). We have also tested our proposed method on the MNIST-USPS dataset.
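For concreteness, the NumPy sketch below illustrates two of the noise patterns in the spirit of [1]; the function names and default parameters are our own illustration, and the exact settings used to produce the dataset follow [1] rather than this sketch.

```python
import numpy as np

# Illustrative noise functions in the spirit of Geirhos et al. [1].
# img: float array of shape (H, W, 3) with values in [0, 1].
def uniform_noise(img, width=0.5):
    """Add i.i.d. uniform noise in [-width, width] and clip back to [0, 1]."""
    noisy = img + np.random.uniform(-width, width, size=img.shape)
    return np.clip(noisy, 0.0, 1.0)

def salt_and_pepper(img, p=0.1):
    """Set a fraction p of pixels to pure black or white at random."""
    out = img.copy()
    mask = np.random.uniform(size=img.shape[:2]) < p           # which pixels to corrupt
    values = np.random.choice([0.0, 1.0], size=int(mask.sum()))
    out[mask] = values[:, None]                                # same value across channels
    return out
```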

Deep CORAL

We adapt the idea of Deep CORAL [4], which simply aligns second-order statistics in the last layer of the backbone network by adding a CORAL loss. This method is simple yet effective, and it is very extensible. We replace the backbone of Deep CORAL with a ResNet-50 pretrained on ImageNet for experiments on Tiny-16-Class-ImageNet. We use the same SGD hyper-parameters as in [4]. The weight $\lambda$ of the CORAL loss is also set as in [4], except on the MNIST-USPS dataset, where we set $\lambda = 1 - \frac{epoch}{num\_epochs}$.
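For reference, here is a minimal PyTorch sketch of the CORAL loss on a pair of feature batches; the function name and batch shapes are our own illustration, but the loss itself is the one defined in [4].

```python
import torch

def coral_loss(source_feats, target_feats):
    """CORAL loss of Sun & Saenko [4]: squared Frobenius distance between
    the feature covariance matrices of the source and target batches.
    Both inputs are (batch, d) feature tensors."""
    d = source_feats.size(1)

    def covariance(x):
        n = x.size(0)
        x = x - x.mean(dim=0, keepdim=True)   # center the batch
        return (x.t() @ x) / (n - 1)

    diff = covariance(source_feats) - covariance(target_feats)
    return (diff ** 2).sum() / (4 * d * d)
```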

ADDA

We also adopt the idea of ADDA [5]: we first learn a discriminative representation using data from the source domain, and then learn another encoding that maps the target domain to the source domain with a domain-adversarial loss. We use a ResNet-50 (excluding the last layer) as the encoder backbone and a 3-layer MLP with hidden size 1024 as the discriminator. The pretrained ResNet-50 is frozen during adversarial training. Adam is used as the optimizer with $\beta_1 = 0.5$ and $\beta_2 = 0.999$; the learning rate is 0.0002 and the batch size is 32. During the adaptation stage, the target encoder is updated every 4 steps.
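The fragment below sketches one adversarial training step under these settings; `src_encoder`, `tgt_encoder`, `disc`, and the optimizer objects (Adam with the hyper-parameters above) are hypothetical names, and details such as device placement are omitted.

```python
import torch
import torch.nn.functional as F

def adda_step(step, src_x, tgt_x, src_encoder, tgt_encoder, disc,
              opt_disc, opt_tgt):
    """One adversarial step: train the discriminator to tell source from
    target features, and every 4 steps train the target encoder to fool it."""
    with torch.no_grad():
        src_f = src_encoder(src_x)            # frozen source encoder
    tgt_f = tgt_encoder(tgt_x)

    # Discriminator update: source features labeled 1, target labeled 0.
    logits = disc(torch.cat([src_f, tgt_f.detach()])).squeeze(1)
    labels = torch.cat([torch.ones(len(src_f)), torch.zeros(len(tgt_f))])
    d_loss = F.binary_cross_entropy_with_logits(logits, labels)
    opt_disc.zero_grad()
    d_loss.backward()
    opt_disc.step()

    # Target-encoder update every 4 steps: inverted labels fool the discriminator.
    if step % 4 == 0:
        fool = F.binary_cross_entropy_with_logits(
            disc(tgt_f).squeeze(1), torch.ones(len(tgt_f)))
        opt_tgt.zero_grad()
        fool.backward()
        opt_tgt.step()
```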

ADDA-CORAL

We propose a new method that combines Deep CORAL and ADDA, using Deep CORAL as the pretraining stage of ADDA and aligning, on the target domain, the second-order statistics between the classification outputs of the fixed pretrained encoder and the ADDA-trained target encoder. The overall architecture is illustrated in Fig. 1. During experiments, we found that vanilla ADDA ruins the pretrained encoder due to a poorly trained discriminator. To make better use of the Deep CORAL-pretrained initialization while ensuring the learned target encoder generates similar features for the target and source domains, we use the CORAL loss only to align the ADDA-trained encoder's classification output with that of the fixed pretrained encoder, and gradually decrease the weight of the CORAL loss; a sketch of the resulting target-encoder objective follows below.

The underlying assumption is that the best possible solution lies near (with respect to optimization using Adam) the already good initialization in the solution space.
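A minimal sketch of this combined objective is given below, reusing the `coral_loss` function from the Deep CORAL sketch above; the function and variable names, and the linear decay schedule for the CORAL weight, are illustrative assumptions rather than the exact implementation.

```python
import torch
import torch.nn.functional as F

def adda_coral_target_loss(epoch, num_epochs, tgt_x,
                           tgt_encoder, pretrained_encoder, classifier, disc):
    """Hypothetical target-encoder objective for the ADDA-CORAL stage."""
    tgt_f = tgt_encoder(tgt_x)

    # Adversarial term: push the discriminator to label target features as source.
    adv = F.binary_cross_entropy_with_logits(
        disc(tgt_f).squeeze(1), torch.ones(tgt_x.size(0)))

    # CORAL term: align second-order statistics of the adapted encoder's
    # classification outputs with those of the frozen pretrained encoder.
    with torch.no_grad():
        ref_out = classifier(pretrained_encoder(tgt_x))
    lam = 1.0 - epoch / num_epochs        # CORAL weight decays over training
    return adv + lam * coral_loss(classifier(tgt_f), ref_out)
```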

Results

Table 1: Our Deep CORAL+ADDA results on Tiny-16-Class-ImageNet and MNIST-USPS.

| Setting    | Source | Target | Acc    |
|------------|--------|--------|--------|
| ResNet-50  | train  | val†   | 25.13% |
| ADDA       | train  | val†   | 48.32% |
| Deep CORAL | train  | val†   | 73.52% |
| Ours       | train  | val†   | 77.69% |
| LeNet      | MNIST  | USPS   | 25.13% |
| ADDA       | MNIST  | USPS   | 89.40% |
| Deep CORAL | MNIST  | USPS   | 54.30% |
| Ours       | MNIST  | USPS   | 94.56% |

†: validation set with uniform noise (0.5)

Table 2: Our Deep CORAL+ADDA results on the unseen test set of Tiny-16-Class-ImageNet.

| Setting            | Train Source | Train Target | Unseen Target | Acc    |
|--------------------|--------------|--------------|---------------|--------|
| ResNet-50          | train        | None         | Test†         | 5.34%  |
| ResNet-50-ImageNet | train        | None         | Test†         | 13.14% |
| Deep CORAL         | train        | val†         | Test†         | 38.37% |
| Ours               | train        | val†         | Test†         | 52.96% |

†: with uniform noise (0.5)

Figure 3: Classification accuracy in percent for different target domains. Model $M_0$ is trained only on the source domain. Models $M_1$ to $M_5$ are each adapted to one target domain (marked by a red rectangle) via ADDA; $M_6$ to $M_{10}$ are adapted analogously with Deep CORAL. The best results for each domain and method are bold in blue.

Discussion

The experimental results in Figure 3 show the improvements of ADDA and Deep CORAL on the target domain. Deep CORAL generally outperforms ADDA by a large margin, except on the high-pass target domain. The failure on this domain is most likely due to the drastic domain shift between high-pass and the other domains, as illustrated in Figure 2 in the dataset section. Deep CORAL also generalizes better to unseen domains, most likely because it does not alter the encoder much and the encoder is pretrained on ImageNet (though without any added noise).

Table 1 shows our proposed Deep CORAL+ADDA results on Tiny-16-Class-ImageNet and MNIST-USPS. We added uniform noise (0.5) to the validation set, making the domain shift from the training set even larger and the domain adaptation task even harder. The high performance and concrete improvements of our Deep CORAL+ADDA method over the other settings validate the effectiveness of our modifications and designs. We also test our method on an unseen and untrained target domain and observe significantly better results, as shown in Table 2.


  1. Robert Geirhos, Carlos R. Medina Temme, Jonas Rauber, Heiko H. Schütt, Matthias Bethge, and Felix A. Wichmann. Generalisation in humans and deep neural networks. arXiv preprint arXiv:1808.08750, 2018.

  2. Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278-2324, 1998.

  3. Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y. Ng. Reading digits in natural images with unsupervised feature learning. 2011.

  4. Baochen Sun and Kate Saenko. Deep CORAL: Correlation alignment for deep domain adaptation. In European Conference on Computer Vision, pages 443-450. Springer, 2016.

  5. Eric Tzeng, Judy Hoffman, Kate Saenko, and Trevor Darrell. Adversarial discriminative domain adaptation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7167-7176, 2017.

  6. Mei Wang and Weihong Deng. Deep visual domain adaptation: A survey. Neurocomputing, 312:135-153, 2018.